Julia Piaskowski
May 14, 2025
https://some-link-to-talk.html
\[Y_{ij} = X_i\beta + Z_ja + \epsilon_{ij} \]
\[\epsilon \sim N(0, \sigma^2 \mathbf{I_n}) \] \[a \sim N(0, \sigma_a^2) \]
\[ y_i|a_{j} \sim𝑁(x_i \beta + a_j, \sigma^2) \]
\[ y_i \sim𝑁(x_i \beta, \sigma^2) \]
Raw \[\epsilon_i = Y_i - \hat{Y_i} \]
(internally) Studentized:
\[\epsilon_i = \frac {Y_i - \hat{Y_i}} {\hat{s}}\]
Pearson/Scaled:
\[ \epsilon_i = \frac {Y_i - \hat{Y_i}} {sd(Y_i)} \]
Dependent variable: \(Y_{ij}\) (survived)
survival: \(p_i = Y_i/N\)
\[Y_{ij}|r_j \sim Binomial(N, \pi_{ij})\]
\[\eta_{ij} = \eta + \alpha_i + r_j\] (\(\eta\) = mean, \(\alpha_i\) = treatment, \(r_j\) = grouping variable)
\[\eta_{ij} = log(\frac{\pi_{ij}} {1 - \pi_{ij}} )\]
Different distributional assumptions and mathematical mean/variance relationships of some distributions require a different approach
Exception: log-linear models because the identity link function is used.
For \(y_1,...,y_n\) responses:
\[y_i \sim \mathcal{P}(\mu_i, \phi)\]
CDF: \[F(y; \mu, \phi)\]
For a continuous \(F\), \(F(y_i; \mu_i, \phi)\) are uniformly distributed, and the quantile residuals are:
\[ r_{i} = \Phi^{-1} \left\{F(y_i; \hat{\mu_i}, \hat{\phi} )\right\}\]
Dunn KP, and GK Smyth (1996). Randomized quantile residuals. J of Comp & Graph Stats 5, 1-10.
Survival: \(y_i \sim Exp(\mu_i)\)
\[ \text{log }\mu_i = \beta_0 + \beta_1 \text{ log } x_i \]
Quantile residuals:
\[ r_{i} = \Phi^{-1} \left\{ 1-exp(y_i/\hat{\mu_i}) \right\}\]
\[r_i = \Phi^{-1}(u_i)\] \[ u_i \sim Uniform(a_i, b_j]\] \[a_i = lim_{y \uparrow y_i} F(y; \hat{\mu_i}, \hat{\phi})\]
\[ b_i = F(y_i; \hat{\mu_i}, \hat{\phi}) \]
\[y_i \sim binomial(n, p_i) \]
\[ \text{logit} (p_i) = \beta_0 + \beta_1x_i \] \[r_i = \Phi^{-1}(u_i)\]
\[ u_i \sim Uniform(a_i, b_j] \]
\[a_i = lim_{y \uparrow y_i} F(y; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k-1 \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]
\[b_i = F(y_i; n, \hat{p_i}) =\displaystyle\sum_{i=0}^{\lfloor k \rfloor} \begin{pmatrix} n \\ i \end{pmatrix} p_i(1-p)^{n-i}\]
First submitted to CRAN in February, 2003.
Randomization is used to produce continuously distributed residuals when the response is discrete or has a discrete component. This means that the quantile residuals will vary from one realization to another for a given data set and fitted model.
(Diagnostics for HierArchical Regression Models)
\[\mathbf{Y = X\beta+Za}\] \[ \mathbf{Y|a\sim G(\mu, R)} \] linear predictor: \(\mathbf{\eta = X\beta}\)
fitted value: \(E(Y) = \mu\)
Link function: \(\eta = g(\mu)\)
For each observation in the data set:
\[ Y_i \sim G(\hat{\mu_i}, \hat{\phi}) \]
\[ r_{i} = F(y_i; \hat{\mu_i}, \hat{\phi} )\]
\[ r_{ij} \sim Uniform(0,1) \]
\(r_{ij} = 0\): everything is larger
\(r_{ij} = 1\): everything is smaller
\(r_{ij} = 0.5\): right in the middle
DHARMa runs 250 simulations (per observation) by default, they recommend up to 1000
\[ (\hat{Y_i} = \lambda = 5,; \quad Y_i = 7 )\]
Create a data set with a random effect (“group”, 10 levels), a covariate (“Enviroinment1”) and a Poisson-distributed response variable (“observedResponse”):
Analyze correctly and incorrectly:
Correctly Specified Model
Incorrectly Specified Model
In Progress
(focus on lme4::glmer() and glmmTMB)
(Implemented in DHARMa)
Bates DM, Maechler M, Bolker BM and S Walker (2015). “Fitting Linear Mixed-Effects Models Using lme4.” Journal of Statistical Software, 67(1), 1-48. doi:10.18637/jss.v067.i01.
Brooks ME, Kristensen K, van Benthem KJ, Magnusson A, Berg CW, Nielsen A, Skaug HJ, Maechler M and BM Bolker (2017). “glmmTMB Balances Speed and Flexibility Among Packages for Zero-inflated Generalized Linear Mixed Modeling.” The R Journal, 9(2), 378-400. doi: 10.32614/RJ-2017-066.
Dunn KP, and GK Smyth (1996). Randomized quantile residuals. Journal of Computational and Graphical Statistics 5, 1-10.
Feng et al (2017). Randomized quantile residuals: an omnibus model diagnostic tool with unified reference distribution. arXiv https://doi.org/10.48550/arXiv.1708.08527
Gelman A and J Hill (2007). Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge University Press, New York.
Hartig F (2022). DHARMa: Residual Diagnostics for Hierarchical (Multi-Level/Mixed) Regression Models. R package version 0.4.6, https://CRAN.R-project.org/package=DHARMa.
Pinheiro JC and DM Bates (2000). Mixed-Effects Models in S and S-PLUS. Springer Verlag, New York.
Stroup WW, Ptukhina M and J Garai (2024). Generalize Linear Mixed Models: Modern Concepts, Methods and Applications. CRC Press, Boca Raton.
(G)LMMs are hard - harder than you may think based on what you may have learned in your second statistics class….